THE IMPLEMENTATION OF TRICK-PLAY MODES FOR PRE-ENCODED VIDEO

Introduction 

The implementation of trick-play modes within digital
video systems is a problem which is becoming more important as
digital video-based systems enter the marketplace. Video CDs, video
on demand (VOD) and other similar systems are starting to emerge as
new consumer products, and many of these products aim to compete
with the VHS tape market as providers of feature-length movies to the
consumer. However, unlike existing analog-based systems, digital video
systems pose more of a challenge when it comes to providing trick-play
modes (fast-forward, fast-reverse, freeze-frame, etc.).

MPEG is fast becoming the standard digital compression 
format for the storage and transmission of digital video material. 
Unfortunately, the processing required to produce a trick-play stream
at, for example, 10 to 30 times normal speed from a single normal-speed
MPEG video stream is relatively complex and expensive. As a
result, a method has been developed which provides trick-play modes
by the use of separate video streams for various fast-forward and
fast-reverse speeds. The system switches between these video streams
when the user requests one of these modes. This trick-play method
has been developed for MPEG-1 and MPEG-2 encoded video material.
However, it may be applied to any digital or analog video system that
requires trick-play modes.

Overview 

The basis of this method is that separate video streams are
employed to provide different trick-play modes. A single stream is
used for normal play, and other streams are used to provide a
variety of fast-forward and fast-reverse modes. The image streams
which provide the trick-play feature need not be encoded at the same
bit-rate, or have the same resolution, as the original image
stream. Encoding the trick-play image streams at a significantly lower
bit-rate and/or resolution may reduce storage space and/or transmission
costs. In addition, human visual perception may tolerate reduced
resolution during trick-play video browsing.

Description 

As already mentioned, parts of this method may be
applied to various forms of video material (analog or digital, and
encoded in a variety of ways). However, in the remainder of this
description, it will be assumed that the trick-play streams are encoded
in an MPEG format. The following parameters will also be assumed:

- There is a single normal-play (normal-speed) MPEG video stream.

- There are two required fast-forward streams (7x and 21x normal speed).

- There are two required fast-reverse streams (negative 7x and negative 21x normal speed).

Note: This is only one possible configuration, chosen for explanation
purposes. The method may be applied equally effectively to any number
of other configurations.

For the above configuration, five separate MPEG-encoded
streams are required. These streams are completely independent and
may have different bit-rates and/or display resolutions. For
example, one possible trade-off between quality and efficiency is
illustrated in Table 1 below. In this case, the trick-play streams
employ a lower resolution (352 x 240 pixels) and a lower bit-rate (1.5
Mbps) than the normal-play stream (704 x 480 at 4.0 Mbps). Such a
trade-off is very reasonable, since very high spatial picture quality
may not be required for trick-play material, and it results in
more efficient storage utilization. The total storage overhead (the extra
storage capacity required to store all forward and reverse trick-play
streams in addition to the normal-play stream) is still less than 15%.
The overhead is calculated by summing, over the trick-play streams,
each stream's bit-rate divided by the magnitude of its playing speed,
and comparing the total with the normal-play bit-rate. For this example
it is 2 x (1.5/7 + 1.5/21) = 0.571 Mbps, or 14.3% of the space required
for the normal-play stream.
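The overhead figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name trick_play_overhead is hypothetical, and it assumes, as the text does, that a stream played at speed s needs bit-rate/|s| of storage per second of program.

```python
def trick_play_overhead(normal_rate_mbps, trick_streams):
    """Storage overhead of the trick-play streams relative to normal play.

    trick_streams: iterable of (bit_rate_mbps, speed) pairs. Speed is
    negative for reverse streams; only its magnitude matters, because a
    stream played at speed s covers the program in 1/|s| of the time.
    """
    extra = sum(rate / abs(speed) for rate, speed in trick_streams)
    return extra / normal_rate_mbps

# The Table 1 configuration: four 1.5 Mbps trick-play streams.
streams = [(1.5, 7), (1.5, 21), (1.5, -7), (1.5, -21)]
overhead = trick_play_overhead(4.0, streams)
print(f"{overhead:.1%}")  # 14.3%
```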



STREAM          BIT-RATE (Mbps)   RESOLUTION
Normal-play     4.00              704x480
7x forward      1.50              352x240
21x forward     1.50              352x240
7x reverse      1.50              352x240
21x reverse     1.50              352x240

Table 1: An example of distributing bit-rate and resolution changes
among trick-play streams.



As the video material is played back from the video server to the
decoder, the server would switch between the various streams in
response to instructions from the user. For example, if the user
chose to fast-forward through the material at the highest speed,
the server would jump from within the normal-play stream to the
appropriate point within the 21x fast-forward stream and continue
playing. Each of the trick-play streams (as well as the normal-play
stream) would require a relatively uniform and short group of
pictures (GOP) size of, for example, half a second. Because entry is
made at the GOP boundary nearest the desired point, this limits the
visual continuity error when switching decoding from one bit-stream
to another to a maximum of 0.25 seconds (half the GOP duration).

An important part of the overall system is the method for
determining switching entry points between the different image
streams. For example, during playback of one stream a user may
wish to switch to another stream. This switch requires calculation of
the exact location in the new stream (to byte accuracy) from which the
decoder should begin to play. The basic steps to determine
this "entry point" are as follows:

1. Determine the current byte offset (and hence the current frame) in the current file.

2. Determine the new frame to switch to in the new file.

3. Determine the byte offset in the new file.

Step 2 is complicated by the fact that, for MPEG streams, the entry 
points into a new stream are limited to those points where a 
sequence_header exists. This is typically at the beginning of a group 
of pictures (GOP). It is further complicated by the fact that the 
duration (in real display time) of a GOP is not always constant (even if 
the number of frames in a GOP is constant). This is because it is 
possible to repeat fields (or frames) in an MPEG sequence, which 
effectively means that more final 'displayed' frames can be produced 
by a single GOP than there are coded 'pictures' in the GOP.
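To illustrate why GOP display duration varies, the sketch below counts displayed fields for a GOP of interlaced frame pictures, where each picture is shown for two fields, or three when its MPEG-2 repeat_first_field flag is set (progressive_sequence = 0). The function names, the 12-picture GOPs and the NTSC field rate are illustrative assumptions.

```python
from fractions import Fraction

# Assumed NTSC field rate (fields per second), kept exact with Fraction.
FIELD_RATE = Fraction(60000, 1001)

def gop_display_fields(repeat_first_field_flags):
    """Displayed fields for a GOP of interlaced frame pictures.

    Each coded frame picture is shown for 2 fields, or 3 if its
    repeat_first_field flag is set (MPEG-2, progressive_sequence = 0).
    """
    return sum(3 if rff else 2 for rff in repeat_first_field_flags)

def gop_duration_seconds(repeat_first_field_flags):
    """Real display time of the GOP in seconds."""
    return gop_display_fields(repeat_first_field_flags) / FIELD_RATE

film_gop = [1, 0] * 6   # 12 coded pictures carrying 3:2 pulldown flags
video_gop = [0] * 12    # 12 coded pictures, no repeated fields
print(gop_display_fields(film_gop) // 2, gop_display_fields(video_gop) // 2)  # 15 12
```

Both GOPs contain 12 coded pictures, yet the pulldown GOP yields 15 displayed frames, so a GOP number alone does not fix a position in time.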

An example of stream switching is illustrated in FIG. 1. In
this case, the original (normal-speed) stream is playing, and two Trick
Play streams are provided at 2x and 10x normal speed. Trick
Play speeds of 2 and 10 times are chosen for simplicity of illustration.
At the instant of Trick Play selection (the switching time) the normal-play
stream is at frame number 20. Possible entry points into each of the
three image streams are determined by sequence headers, which are
indicated by the darkened frames. The "best fit" frames which can be
switched to are indicated by the arrowhead line which links possible
entry points into the various video streams. The "ideal" or desired
entry points, in terms of the user's visual continuity, are indicated by
horizontally shaded frames. Note that these "ideal" points cannot
always be calculated simply as (current frame in normal
sequence)/(trick-play stream speed), because of the complications described
above. In each case, the actual frame switched to is the "best fit"
possible entry frame, i.e. the one closest in time to the user's desired or
"ideal" frame.

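Choosing the "best fit" entry point amounts to a nearest-neighbour search over the display times of the possible entry points in the target stream. The sketch below assumes those times are already known (for example from an off-line time map) and sorted; best_fit_entry is a hypothetical name. With entry points every 0.5 seconds, the worst-case error it produces is half a GOP, matching the 0.25 second figure given earlier.

```python
import bisect

def best_fit_entry(entry_times, ideal_time):
    """Return the index of the entry point closest in time to ideal_time.

    entry_times: non-empty, sorted display times (seconds) of the
    sequence headers (possible entry points) in the target stream.
    """
    i = bisect.bisect_left(entry_times, ideal_time)
    if i == 0:
        return 0
    if i == len(entry_times):
        return len(entry_times) - 1
    before, after = entry_times[i - 1], entry_times[i]
    # Prefer the later entry point only if it is strictly closer.
    return i if after - ideal_time < ideal_time - before else i - 1

# Hypothetical target stream with a GOP (entry point) every 0.5 s.
entries = [k * 0.5 for k in range(20)]
worst = max(abs(entries[best_fit_entry(entries, t / 100)] - t / 100)
            for t in range(950))
print(worst)
```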


RCA 87865/4401

FIG. 1 CHANGING BETWEEN NORMAL & TRICK PLAY MPEG VIDEO STREAMS
(switching instant at normal-play frame 20)

From the illustration in FIG. 1, the decision of which frame to switch to
appears to be obvious. From an algorithmic point of view, however,
it is far from trivial. To determine the switching points between the
different streams, a look-up table (LUT) is employed. The function and
arrangement of this table are described below; Table 2 shows the
general layout of the LUT.
[number_of_tables]

[Table_number] {file_name} <bit_rate Mbps> [num_gops]
[num_frames] [gop_size] [1st_gop_size] [speed]
[gop number] [file byte offset]
[gop number] [file byte offset]
[gop number] [file byte offset]

   Repeated [num_gops] times

[gop number] [file byte offset]
[gop number] [file byte offset]
[gop number] [file byte offset]

Repeat all of the above (except for the first line) [number_of_tables] times.

Table 2: Look-up Table layout

The items in Table 2 are described below: 

[] denotes an integer value

<> denotes a floating point value

{} denotes a text string

[number_of_tables]

The number of look-up tables in the file (the same as the number of
bitstreams). In most cases there is one normal-play stream, one 7x
stream, one -7x stream, one 21x stream and one -21x stream, so
[number_of_tables] would be 5.

[Table_number]

A number associated with the ordering of the streams. It must be
between 0 and [number_of_tables] - 1. [Table_number] also gives the
order of the streams (from fastest reverse to fastest forward).

{file_name}

The name of the muxed MPEG stream. 

<bit_rate>

The rate (in Mbits/second) of the muxed MPEG stream (including
transport layer overhead).

[num_gops]

The number of GOPs in the video stream. 
[num_frames]

The total number of frames (displayed) in the MPEG video stream 
(before any pulldown). 

[gop_size] 

The GOP size (in displayed frames - taking into account 3/2 
pulldown if necessary). 

[1st_gop_size]

Size (in displayed frames) of the first GOP. Usually this will be 
[gop_size]-N+L 

[speed] 

Speed of the trick-play stream (including sign), e.g. -7 for the
negative 7x stream.
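A reader for this layout can work on whitespace-separated tokens, which makes it indifferent to how the header fields are split across lines. This sketch assumes file names contain no spaces and that <bit_rate> is written as a bare number (without the "Mbps" unit); the class, function and sample file names and values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GopTable:
    """One table from the Table 2 layout (field names follow the text)."""
    table_number: int
    file_name: str
    bit_rate_mbps: float
    num_gops: int
    num_frames: int
    gop_size: int
    first_gop_size: int
    speed: int
    gop_offsets: dict = field(default_factory=dict)  # gop number -> byte offset

def parse_lut(text):
    """Parse LUT text laid out as in Table 2 into a list of GopTable objects."""
    tokens = iter(text.split())
    tables = []
    for _ in range(int(next(tokens))):           # [number_of_tables]
        table = GopTable(
            int(next(tokens)),    # [Table_number]
            next(tokens),         # {file_name}
            float(next(tokens)),  # <bit_rate>
            int(next(tokens)),    # [num_gops]
            int(next(tokens)),    # [num_frames]
            int(next(tokens)),    # [gop_size]
            int(next(tokens)),    # [1st_gop_size]
            int(next(tokens)),    # [speed]
        )
        for _ in range(table.num_gops):          # [gop number] [file byte offset]
            gop = int(next(tokens))
            table.gop_offsets[gop] = int(next(tokens))
        tables.append(table)
    return tables

sample = """2
0 reverse7.mpg 1.5 3
42 15 12 -7
0 0
1 70000
2 163750
1 normal.mpg 4.0 2
27 15 12 1
0 0
1 180000
"""
tables = parse_lut(sample)
print([(t.file_name, t.speed) for t in tables])  # [('reverse7.mpg', -7), ('normal.mpg', 1)]
```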

Such an LUT is stored in the system memory during playback of the 
video material. When the user changes from one speed to another, the 
information in the LUT is used to start decoding the new stream from 
the correct place. The information in the LUT is needed for this 
purpose along with the current offset (in bytes) in the bit-stream 
currently being played. 

To switch streams, the current GOP is first determined from the
current file offset. This is accomplished by searching the LUT for
the GOP start point which corresponds to the current offset (see
Table 2). Once this is known, the new GOP is calculated from the old
GOP number, given the old and new speeds, the GOP size, the number of
frames and the first GOP size. The appropriate LUT is then used to find
the file offset (in the new file) corresponding to the calculated new GOP,
and the new stream is played starting at this offset. The
relative simplicity of this system results in efficient switching between
different streams. However, this real-time calculation method is based
on two assumptions: that the stream contains GOPs of the same
size, and that material derived from 3:2 pulldown telecine has not been
edited in a way that disturbs the frame sequence.
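The real-time calculation can be sketched as below, under the same two assumptions the text states (constant GOP size after the first GOP, undisturbed frame sequence). The Stream record mirrors the LUT header fields. The mapping of reverse streams (trick frame k shows normal-play frame (num_frames - k) x |speed|), the rounding to the nearest GOP start, and all names and sample values are assumptions made for illustration.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Stream:
    speed: int            # [speed], negative for reverse streams
    num_frames: int       # [num_frames]
    gop_size: int         # [gop_size]
    first_gop_size: int   # [1st_gop_size]
    gop_starts: list      # byte offset of each GOP, indexed by GOP number

def frames_before_gop(g, s):
    """Displayed frames preceding GOP g (GOP 0 is the first, shorter GOP)."""
    return 0 if g == 0 else s.first_gop_size + (g - 1) * s.gop_size

def program_position(g, s):
    """Position, in normal-play frames, shown at the start of GOP g."""
    k = frames_before_gop(g, s)
    # Assumption: a reverse stream starts at the end of the program.
    return k * s.speed if s.speed > 0 else (s.num_frames - k) * -s.speed

def nearest_gop(k, s):
    """GOP of stream s whose start is nearest displayed frame k."""
    if k <= s.first_gop_size / 2:
        return 0
    g = 1 + round((k - s.first_gop_size) / s.gop_size)
    return max(0, min(g, len(s.gop_starts) - 1))

def switch_offset(old, offset, new):
    """Byte offset at which to start decoding the new stream."""
    g_old = bisect.bisect_right(old.gop_starts, offset) - 1   # current GOP
    p = program_position(g_old, old)
    k_new = p / new.speed if new.speed > 0 else new.num_frames - p / -new.speed
    return new.gop_starts[nearest_gop(k_new, new)]

normal = Stream(1, 297, 15, 12, [100_000 * g for g in range(20)])
fwd7 = Stream(7, 42, 15, 12, [0, 40_000, 80_000])
rev7 = Stream(-7, 42, 15, 12, [0, 40_000, 80_000])
print(switch_offset(normal, 1_050_000, fwd7))  # 80000
```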

In view of these two potential variables, and without
knowing in advance exactly how many frames a GOP will produce
(for example when decoding 3:2 pulldown material), it may be
difficult to determine accurately, with real-time calculation, exactly
where to enter a second image stream. The real-time calculation
method has therefore been used in a further method which calculates and
constructs off-line (i.e. not in real time) multiple look-up tables (LUTs)
containing all possible entry points into the various play and trick
play streams. Complete 'time-maps' of the new streams are thus
available. These are needed because, even if the current 'real-time'
frame number is known, it is not possible to calculate which picture
number in the new stream corresponds to the same point in time. In
addition to this practical problem, it is also advantageous to allow a
user to fine-tune or modify the stream switching delay and accuracy
independently from the actual switching software. For example, a user
may, in the interest of continuity of entertainment, opt to always join
the new image stream 1/2 or 1 second prior to the departure point in the
first stream. The software then never requires modification, even when
a switching scheduling change is necessary. For these reasons, a generic
LUT format has been developed which allows the entry point calculation
and tuning of stream switching delays to be done independently from
the software which employs the tables. A conceptual illustration of two
LUT sets is shown in FIG. 2, for transitions from play speed and from
7 times play speed. Similar sets of tables are required for transitions
from 21X, -7X and -21X trick play speeds.



FIG. 2 TRICK PLAY LOOKUP TABLES
(Lookup tables for transition from play speed: 1X TO 7X, 1X TO 21X,
1X TO -7X, 1X TO -21X. Lookup tables for transition from 7X play
speed: 7X TO 1X, 7X TO -7X, 7X TO 21X, 7X TO -21X.)
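Because every stream needs a table to every other stream, a system of N streams needs N-1 tables per stream and N(N-1) tables in total. The hypothetical helper below simply enumerates the ordered (from, to) pairs for the five-stream example; the four tables it lists for 7X correspond to the second set in FIG. 2.

```python
from itertools import permutations

def switching_tables(speeds):
    """Name one look-up table for every ordered (from, to) pair of streams."""
    return [f"{a}X TO {b}X" for a, b in permutations(speeds, 2)]

tables = switching_tables([1, 7, 21, -7, -21])
print(len(tables))                                # 20
print([t for t in tables if t.startswith("7X ")])
# ['7X TO 1X', '7X TO 21X', '7X TO -7X', '7X TO -21X']
```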

The basis of the LUT-based switching method is as follows. In a
system with N streams, comprising a normal-play stream and various
trick-play streams, it is desirable to be able to switch from
any stream to any other stream. Hence for each stream, N-1 tables of
(byte-offset, byte-offset) pairs are required. The first offset in each
pair corresponds to the point or location being viewed in the current
stream. The second offset refers to the same point in time (program
location) in the stream to be switched to. The general layout of these
LUTs is illustrated in Table 2, where ["from" byte offset N] is the offset
(in bytes) in the current file (video stream) to be switched
from, ["to" byte offset N] is the offset (in bytes) where decoding should
start in the new file (video stream) that is being switched to, and
[num_pairs] is the number of pairs of switching coordinates in the file.
The number of pairs in each table depends only on the
required precision when switching streams (fewer pairs save storage
space but provide less accuracy). However, the upper limit on
accuracy is still governed by the number of GOPs (and hence the
number of possible entry points) in the stream being switched to.
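The off-line construction of one switching table can be sketched as follows. It assumes each stream's entry points (GOP starts) are available as a time map of (program time, byte offset) pairs. The lead_seconds parameter is a hypothetical knob for the 1/2 or 1 second early join mentioned earlier. The sample time maps use the Table 1 bit-rates with a 0.5 second GOP and ignore transport overhead; all names are illustrative.

```python
import bisect

def build_pair_table(from_map, to_map, lead_seconds=0.0):
    """Build ("from" offset, "to" offset) pairs for one switching table.

    from_map: (program_time_s, byte_offset) entry points of the current
              stream, in increasing byte-offset order.
    to_map:   the same for the target stream, sorted by program time.
    lead_seconds: join the target this much program time earlier.
    """
    to_times = [t for t, _ in to_map]
    pairs = []
    for time, from_offset in from_map:
        target = time - lead_seconds
        i = bisect.bisect_left(to_times, target)
        # Pick whichever neighbouring entry point is closest in time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(to_times)]
        best = min(candidates, key=lambda j: abs(to_times[j] - target))
        pairs.append((from_offset, to_map[best][1]))
    return pairs

normal_map = [(0.5 * k, 250_000 * k) for k in range(40)]   # 4.0 Mbps, 1x
fwd7_map = [(3.5 * k, 93_750 * k) for k in range(6)]       # 1.5 Mbps, 7x
table_7_to_1 = build_pair_table(fwd7_map, normal_map, lead_seconds=1.0)
print(table_7_to_1[:3])
```

Changing lead_seconds and regenerating the table alters the switching behaviour without touching the playback software, which is the flexibility the text describes.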



["from" byte offset 1] ["to" byte offset 1]
["from" byte offset 2] ["to" byte offset 2]
["from" byte offset 3] ["to" byte offset 3]

   Repeated [num_pairs] times

["from" byte offset <num_pairs-1>] ["to" byte offset <num_pairs-1>]
["from" byte offset <num_pairs>] ["to" byte offset <num_pairs>]

Table 2: Look-up Table layout
Hence for a system employing N video streams, stream S1 will use (N-1)
tables. These tables are used for switching from stream S1 to S2
(T_1_2), S1 to S3 (T_1_3), S1 to S4 (T_1_4) and S1 to SN (T_1_N).
For example, if the current image location is at offset O1 in
stream S1 and a switch to stream S3 is required, the following
operations are performed:

1) Find the closest (in time) "from" offset to O1 in table T_1_3.

2) Read the corresponding "to" offset from the same table.

3) Start decoding stream S3 from the "to" offset value read.
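Steps 1) and 2) reduce to a binary search over the sorted "from" offsets. This sketch treats "closest in time" as "closest in bytes", which holds only for a constant bit-rate stream; the function name and table values are hypothetical. In practice the list of "from" offsets would be built once when the table is loaded, not on every lookup.

```python
import bisect

def lookup_to_offset(pairs, current_offset):
    """Return the "to" offset paired with the "from" offset closest to
    current_offset. pairs: ("from", "to") tuples sorted by "from" offset.
    """
    froms = [f for f, _ in pairs]
    i = bisect.bisect_left(froms, current_offset)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(froms)]
    best = min(candidates, key=lambda j: abs(froms[j] - current_offset))
    return pairs[best][1]

# Hypothetical table T_1_3 for switching from stream S1 to stream S3.
t_1_3 = [(0, 0), (250_000, 93_750), (500_000, 187_500)]
print(lookup_to_offset(t_1_3, 300_000))  # 93750
```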

The overhead from these tables is still very minor compared to the
storage space required for the video streams themselves. The simple
use of these tables by the control software also represents very little
processing overhead when switching streams. In addition,
all control and fine tuning of the switching procedure (accuracy,
timing, etc.) can be controlled by altering the values and number of
entries in the tables themselves, without requiring access to the
software (which may not be available to a user).

Conclusions 

This document has described methods for implementing trick-play
modes for pre-encoded video material on a variety of media (for
example, video servers and video disk). These methods allow the use
of separate trick-play streams which may have differing bit-rates and
resolutions from the original encoded video stream. The methods for
switching between the various coded video streams are based on the
use of look-up tables (LUTs), which provide an efficient, accurate and
flexible means of switching between streams.